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Abstract 

Approximate Bayesian computation methods can be used to evaluate posterior distributions without 
having to calculate likelihoods. In this paper we discuss and apply an approximate Bayesian computation 
(ABC) method based on sequential Monte Carlo (SMC) to estimate parameters of dynamical models. 
We show that ABC SMC gives information about the inferability of parameters and model sensitivity 
to changes in parameters, and tends to perform better than other ABC approaches. The algorithm 
is applied to several well known biological systems, for which parameters and their credible intervals 
are inferred. Moreover, we develop ABC SMC as a tool for model selection; given a range of different 
mathematical descriptions, ABC SMC is able to choose the best model using the standard Bayesian 
model selection apparatus. 



1 Introduction 

Most dynamical system studied in the physical, life and social sciences and engineering are modelled by or- 
dinary, delay or stochastic differential equations. However, for the vast majority of systems and particularly 
for biological systems, we lack reliable information about parameters and frequently have several competing 
models for the structure of the underlying equations. Moreover, the biological experimental data is often 
scarce and incomplete, and the likelihood surfaces of large models arc complex. The analysis of such dynam- 
ical systems therefore requires new, more realistic quantitative and predictive models. Here we will develop 
novel statistical tools that allow us to analyze such data in terms of dynamical models by (i) providing 
estimates for model parameter values, and (ii) allowing us to compare the performance of different models 
in describing the overall data. 

In the last decade extensive research has been conducted on estimating the parameters of deterministic 



systems. Much attention has been given to local and global nonlinear optimization methods (Mendes & Kell 



1998 Moles et al. 2003) and generally parameter estimation has been performed by maximum likelihood 
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estimation (Timmer & Muller, 2004 Muller et al. 2004 Baker et al. 2005 Bortz & Nelson, 20061. The 



methods developed for ordinary differential equations have been extended to ordinary differential equations 



with time delays (Horbelt et al. 20021. Deterministic models have also been parameterized in a Bayesian 



framework using Bayesian hierarchical models (Putter et al. 2002 Banks et al. 2005 Huang et al. 20061 



Simulated annealing, which attempts to avoid getting trapped in local minima, is another well known opti- 



mization algorithm that has been found successful in various applications (Kirkpatrick et al. 1983 Mendes 



& Kell 1998). There are also several Monte Carlo based approaches applied to parameter estimation of 



deterministic (Battogtokh et al. 2002 Brown & Sethna 2003) and stochastic (Sisson et al. 2007) systems. 



Parameter estimation for stochastic models has been extensively explored in financial mathematics ( Johannes 



& Poison 2005) and has been applied to biological systems in a frcqucntist maximum likelihood (Rcinkcr 



et al] |2006[ ) and Bayesian ( jGolightly fc Wilkinson") |2005[ |2006| |Wilkinson[ |2006[ ) framework. 

Most commonly, model selection has been performed by likelihood ratio tests (in case of nested models) or 
the Akaike information criterion (in case of non-nested models) . Recently Bayesian methods have increasingly 



been coming into use. Vyshemirsky & Girolami ( 2008 ) have investigated different ways of computing Bayes 



factors for model selection of deterministic differential equation models, and Brown & Sethna (20031 have 



used the Bayesian information criterion. In population genetics, model selection has been performed using 



approximate Bayesian computation (ABC) in its basic rejection form (Zucknick 2004 Wilkinson 20071 and 
coupled with multinomial logistic regression ( |Beaumont 2008a; Fagundes et al. 20071. 

There is thus a wide variety of tools available for parameter estimation and, to a lesser extent, model 
selection. However, to our knowledge, no method available can be applied to all different kinds of mod- 
elling approaches (e.g. ordinary or stochastic differential equations with and without time delay) without 
substantial modification, estimate credible intervals, take incomplete or partially observed data as input, be 
employed for model selection, and reliably explore the whole parameter space without getting trapped in 
local extrema. 

In this paper we apply an ABC method based on sequential Monte Carlo (SMC) to the parameter esti- 
mation and model selection problem for dynamical models. In ABC methods the evaluation of the likelihood 



is replaced by a simulation-based procedure (Pritchard et al. 1999 Beaumont et al. 2002 Marjoram et al. 



2003 Sisson et al. 20071. We explore the information that ABC SMC gives about inferability of parameters 



and sensitivity of the model to parameter variation. Furthermore we compare the performance of ABC SMC 
to other approximate Bayesian computation methods. The method is illustrated on two simulated datasets 
(one from ecology and one from molecular systems biology), and real and simulated epidemiological datasets. 
As we will show, ABC SMC yields reliable parameter estimates with credible intervals; can be applied to 
different types of models (e.g. deterministic as well as stochastic models); is relatively computationally effi- 
cient (and easily parallelized); allows for discrimination among sets of candidate models in a formal Bayesian 
model-selection sense; and gives us an assessment of model and parameter sensitivity. 

2 Methods 

In this section we review and develop the theory underlying ABC with emphasis on applications to dynamical 
systems, before introducing a formal Bayesian model selection approach in an ABC context. 



2.1 Approximate Bayesian Computation 

Approximate Bayesian Computation methods have been conceived with the aim of inferring posterior dis- 
tributions where likelihood functions are computationally intractable or too costly to evaluate. They exploit 
the computational efficiency of modern simulation techniques by replacing calculation of the likelihood with 
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a comparison between the observed data and simulated data. 

Let 9 be a parameter vector to be estimated. Given the prior distribution tt(0), the goal is to approximate 
the posterior distribution, ir(9\x) cx f(x\9)ir(9), where f(x\0) is the likelihood of 9 given the data x. ABC 
methods have the following generic form: 

1 Sample a candidate parameter vector 9* from some proposal distribution tt(9). 

2 Simulate a dataset x* from the model described by a conditional probability distribution f(x\9*). 

3 Compare the simulated dataset, x* , with the experimental data, xq, using a distance function, d, and 

tolerance e; if d(xo,x*) < e, accept 9*. The tolerance e > is the desired level of agreement between 
Xq and x*. 

The output of an ABC algorithm is a sample of parameters from a distribution n(9\d(xo, x*) < e). If e is 
sufficiently small then the distribution ir(9\d(xo,x*) < e) will be a good approximation for the posterior 
distribution, tt(6\xq). It is often difficult to define a suitable distance function d(xo,x*) between the full 
datasets, so one may instead replace it with a distance defined on summary statistics, S(xq) and S(x*), of 
the datasets. That is, the distance function may be defined as d(xo,x*) = d'(S(xo),S(x*)), where d! is a 
distance function define on the summary statistic space. However, as we here consider values of a dynamical 
process at a set of time points, we are able to compare the datasets directly without the use of summary 
statistics. In any case, the algorithms take the same form. 



The simplest ABC algorithm is the ABC rejection sampler (Pritchard et al. 19991: 
Rl Sample 9* from tt(9). 
R2 Simulate a dataset x* from f(x\9*). 
R3 If d(xo,x*) < e, accept 9*, otherwise reject. 
R4 Return to Rl. 

The disadvantage of the ABC rejection sampler is that the acceptance rate is low when the prior distribution 
is very different from the posterior distribution. To avoid this problem an ABC method based on Markov 



chain Monte Carlo was introduced (Marjoram et al. 2003 1. The ABC MCMC algorithm proceeds as follows: 
Ml Initialize 9 i: i = 0. 

M2 Propose 9* according to a proposal distribution q(9\9,i). 
M3 Simulate a dataset x* from f(x\9*). 

M4 If d(x ,x*) < e, go to M5, otherwise set 9 l+1 = 9i and go to M6. 
M5 Set d i+ i = 9* with probability 



a = mm 1, — - — - — ; — , — - 
7r(0i)«z(0*|0,) 



and 9 i+ i = 9i with probability 1 — a. 
M6 Set i = i + 1, go to M2. 
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The outcome of this algorithm is a Markov chain with stationary distribution n(8\d(xo, x*) < e) (Mar 



joram et al. 20031. That is, ABC MCMC is guaranteed to converge to the target approximate posterior 
distribution. Notice that if the proposal distribution is symmetric, q(6i\9*) — q(9*\9i), then a depends only 
on the prior distribution. Further, if the prior is uniform, then a = 1 in M5. Potential disadvantages 
of the ABC MCMC algorithm are that the correlated nature of samples coupled with the potentially low 
acceptance probability may result in very long chains and that the chain may get stuck in regions of low 
probability for long periods of time. 

The above mentioned disadvantages of ABC rejection and ABC MCMC methods can, at least in parts, 



be avoided in ABC algorithms based on sequential Monte Carlo (SMC) methods, first developed by Sisson 



et al. (2007). In this paper we derive ABC SMC from a sequential importance sampling (SIS) algorithm 



(Del Moral et al. 2006 2008 1; see Appendix A for the derivation and B for a comparison with the algorithm 



of Sisson et al. (20071 



In ABC SMC, a number of sampled parameter values (called particles) , {9^ , . . . , 8^}, sampled from the 
prior distribution, tt(6), is propagated through a sequence of intermediate distributions, 7r(0|d(xo, x*) < gj), 
i = 1, . . . , T—l, until it represents a sample from the target distribution, Tr(9\d(xo, x*) < ey). The tolerances, 
e,, are chosen such that e\ > . . . > > 0, thus the distributions gradually evolve towards the target 
posterior. For sufficiently large numbers of particles the population approach can also in principle avoid the 
problem of getting stuck in areas of low probability encountered in ABC MCMC. The ABC SMC algorithm 
proceeds as followtj^] 

SI Initialize e\, . . . , ex- 
Set the population indicator t = 0. 

52.0 Set the particle indicator i — 1. 

52.1 If t = 0, sample 9** independently from ir(8). 

Else, sample 8* from the previous population {^_x} with weights Wt—i and perturb the particle to 

obtain 9** ~ K t (9\9*), where K t is a perturbation kernel. 

If tt(8**) = 0, return to S2.1. 

Simulate a candidate dataset x* ~ f(x\9**). 

If d(x*,xo) > et, return to S2.1. 

(i) (i) 

52.2 Set 9\ ' = 8** and calculate the weight for particle 9 t , 



1, ift = 0, 

-Mh if t > o 



If i < N set i = i + 1, go to S2.1. 

S3 Normalize the weights. 

If t < T, set t = t + 1, go to S2.0. 

Particles sampled from the previous distribution are denoted by a single asterisk, and after perturbation 
these particles are denoted by a double asterisk. Here we choose the perturbation kernel K t to be a random 
walk (uniform or Gaussian). Note that for the special case when T = 1, the ABC SMC algorithm corresponds 
to the ABC rejection algorithm. 

1 For a more general version of the algorithm, suitable especially for application to stochastic models, see Appendix A. 
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Table 1 : Interpretation of Bayes factor, adapted from 



Kass & Raftery 



(1995). 



The value of 


Evidence against m 2 


Bayes factor B 12 


(and in favour of mi) 


1 to 3 


Very weak 


3 to 20 


Positive 


20 to 150 


Strong 


> 150 


Very strong 



2.2 Model selection 

Here we introduce an ABC SMC model selection framework which employs standard concepts from Bayesian 
model selection, including Bayes factors (a comprehensive review of Bayesian model selection can be found 
in Kass fc Raftery| (1995])). Let mi and m 2 be two models; we would like to choose which model explains 



the data, x, better. The Bayes factor is defined as 

P(mi\x)/P(m 2 \x) 
P( mi )/P(m 2 ) ' {L) 

where P(rrii) is the prior and P{rn,i\x) the posterior distribution of model mi, i = 1,2. If the priors are 
uniform, then (JT|) simplifies to 

_ P(mi\x) 

B\ 2 - — p-. (2) 

P(m 2 \x) 

The Bayes factor is a summary of the evidence provided by the data in favour of one statistical model over 
another (see Table [T] for its interpretation) . 

There are several advantages of Bayesian model selection compared to traditional hypothesis testing. 
Firstly, the models being compared do not need to be nested. Secondly, Bayes factors do not only weigh the 
evidence against a hypothesis (in our case m 2 ), but can equally well provide evidence in favour of it. This 
is not the case for traditional hypothesis testing where a small p-valuc only indicates that the null model 
has insufficient explanatory power. However, one cannot conclude from a large p- value that the two models 
are equivalent, or that the null model is superior, but only that there is not enough evidence to distinguish 
between them. In other words, "failing to reject" the null hypothesis cannot be translated into "accepting" 



the null hypothesis (Cox & Hinkley 1974 Kass & Raftery 19951. Thirdly, unlike the posterior probability 



of the model, the p- value does not provide any direct interpretation of the weight of evidence (the p- value is 
not the probability that the null hypothesis is true). 

Here we approach the model selection problem by including a "model parameter" m G {1, . . . , M}, where 
M is the number of models, as an additional discrete parameter and denote the model-specific parameters 
as 9(m) = (0(m)^ 1 -', . . . , 0(m)^ fem - ) ), m = 1, . . . , M, where k m denotes the number of parameters in model m. 

In each population we start by sampling a model indicator m from the prior distribution 7r(m). For 
model m we then propose new particles by perturbing the particles from the previous population specific to 
m; this step is the same as in the parameter estimation algorithm. The weights for particles 0(m) are also 
calculated like in the parameter estimation algorithm for m. 

The ABC SMC algorithm for model selection proceeds as follow^] 

MSI Initialize e±, . . . , £t- 

Set the population indicator t = 0. 



2 In stochastic framework we again suggest using the general form of the algorithm with Bt > 1, see Appendix A. 
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MS2.0 Set the particle indicator i = 1. 

MS2.1 Sample m* from 7r(m). 

If t = 0, sample 9** from ir(6(m*)). 

If i > 0, sample 9* from the previous population {6(m*)t-i} with weights w(m*)t-i- 

Perturb the particle 9* to obtain 9** ~ K t (9\9*). 

If 7r(6»**) = 0, return to MS2.1. 

Simulate a candidate dataset x* ~ /(ac|0**,m*). 

If d(x*,2;o) > e t , return to MS2.1. 

MS2.2 Set m$ = m* and add 0** to the population of particles {0(m*)t}, and calculate its weight as 



1, 

Tree** 

£f =1 



if * = 0, 
if t > 0. 



If i < N set i = i + 1, go to MS2.1. 

MS3 For every m, normalize the weights. 
If t < T, set * = t + 1, go to MS2.0. 

The outputs of the ABC SMC algorithm are approximations of the marginal posterior distribution of the 
"model parameter" P{m\x) and the marginal posterior distributions of parameters P(9i\x, m), m = 1, . . . , M, 
i = 1, . . . , k m . Note that it can happen that a model dies out (i.e. there are no particles left that belong 
to a particular model) if it offers only a poor description of the data. In this case the sampling of particles 
continues from the remaining models only. 

Bayes factors can be obtained directly from P(m\x) using equation ([2]). However, in many cases there 



will not be a single best and most powerful/explanatory model (Stumpf & Thorne 2006). More commonly 



different models explain different parts of the data to a certain extent. One can average over these models 
to get a better inference than from a single model only. The approximation of the marginal posterior 
distribution of the model, P(m\x), which is the output of the above algorithm, can be used for Baycsian 



model averaging (Hooting et al, 1999). 



The parameter estimation for each of the models is done simultaneously with the model selection. The 
model with the highest posterior probability will typically have the greatest number of particles, thereby 
ensuring a good estimate of the posterior distribution for the parameters. However, some models are poorly 
represented in the marginal posterior distribution of m (i.e. only a small number of particles belong to these 
models) , and so the small number of particles does not provide very good estimated of the posterior distri- 
butions on parameters. Therefore, one might wish to estimate parameters for these models independently. 

We note that the ABC SMC model selection algorithm implicitly penalizes the models with large number 
of parameters; the higher the parameter dimension is, the smaller is the probability that the perturbed 
particle is accepted. 



2.3 Implementation of the algorithm 

The algorithm is implemented in C++. For the ODE solver code the 4 th order classical Runge Kutta 
algorithm from the GSL Scientific Library (Galassi 20031 is used; for the simulation of stochastic models 



we use Gillespie's algorithm (Gillespie 19771; and for the simulation of delay differential equations we 



implemented the algorithm based on the adaptive step size ODE solvers from Numerical recipes in C ( Press 



et al. 19921 extended by code handling the delay part according to Paul (1992) and Enright & Hu (1995). 



G 



3 Results 



We demonstrate the performance of the ABC algorithms using simulated data from deterministic and stochas- 
tic systems. The data points were obtained by solving the systems for some fixed parameters at chosen time 
points. The sizes of the input datasets were chosen to reflect what can typically be expected in real-world 
datasets in ecology, molecular systems biology and epidemiology The first two examples highlight the com- 
putational performance of ABC SMC, the problem of inferability of dynamical models and its relationship 
to parameter sensitivity. The third example illustrates the use of ABC SMC for model selection, which is 
then further demonstrated in an application to a real dataset. 

3.1 Parameter inference for deterministic and stochastic Lotka-Volterra model 



The first model is the Lotka-Volterra (LV) model (Lotka, 1925 Volterra[ |1926 1 describing the interaction 



between prey species, x, and predator species, y, with parameter vector 9 = (a, b): 

ax — xy, (3a) 
bxy - y. (3b) 



dx 
~dt 
dy_ 

dt 



3.1.1 Computational efficiency of ABC SMC applied to deterministic LV dynamics 

The data, {(xd,yd)}, are noisy observations of the simulated system with parameter values set at (a, b) — 
(1,1). We sample 8 data points (for each of the species) from the solution of the system for parameters (a, b) — 
(1,1) and add Gaussian noise A/"(0, (0.5) 2 ) (see Figure [ija)). Let the distance function d((xd,yd),{x,y)) 
between the data {(^[i], 2/d[i])}, i — I, ... ,8 and a simulated solution for proposed parameters, {(x[i], y[i])}, 
be the sum of squared errors: 



d((x, y), (x dl y d )) = ]T ((x[i\ - x d [z]) 2 + (y[i\ - y d {i]) 2 ) . (4) 

i 

In Appendix C we show that this distance function is, in fact, related to the conventional likelihood treatment 
of ODEs. 

The distance between our noisy data and the deterministic solution for (a, b) — (1,1) from which the 
data was generated is 4.23, so the lowest distance to be reached is expected to be close to this number and 
we choose the tolerance e accordingly. 

We first apply the ABC rejection sampler approach with e = 4.3. The prior distributions for a and b are 
taken to be uniform, a, b ~ U(—10, 10). In order to get 1, 000 accepted particles, approximately 14.1 million 
data generation steps are needed, which means that the acceptance rate (7 • 10~ 5 ) is extremely low. The 
inferred posterior distributions are shown in Figure [TJb). 

Applying the ABC MCMC scheme outlined above yields results comparable to those of ABC rejection, 
and after a careful calibration of the approach (using an adaptive Gaussian proposal distribution) we manage 
to markedly reduce the computational cost (including burn-in, we had to generate between 40, 000 and 60, 000 
simulations in order to obtain 1,000 independent particles). 

We next apply the ABC SMC approach. The prior distributions for a and b are taken to be uniform, 
a,b ~ U(— 10,10), and the perturbation kernels for both parameters are uniform, K t = aU{— 1,1), with 
a = 0.1. The number of particles in each population is N — 1000. To ensure the gradual transition between 
populations we take T = 5 populations with e = (30.0,16.0,6.0,5.0,4.3). The results are summarized in 
Table [2] and Figure [2] From the last population (population 5) it can be seen that both parameters are 
well inferred (a: median = 1.05, 95%-quantile range = [1.00, 1.12], b: median = 1.00, 95%-quantile range = 
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prey 
predator 
prey dab 
proa a lor 




0.7 0.9 1.1 1.3 



i i i i r 

0.6 1.0 1.' 



(a) 



(b) 



Figure 1: (a) Trajectories of prey and predator populations of the deterministic Lotka-Volterra system and the data 
points, (b) Parameters inferred by the ABC rejection sampler. 



Tabic 2: Cumulative number of data generation steps needed to accept 1000 particles in each population for deter- 
ministic LV dynamics. 



Population 


1 


2 


3 


4 


5 


Data gen. steps 


26,228 


36,667 


46,989 


49,271 


52,194 



[0.87, 1.11]). The outcome is virtually the same as previously obtained by the ABC rejection sampler (Figure 
[ljb)), however, there is a substantial reduction in the number of steps needed to reach this result. For this 
model the ABC SMC algorithm needs 50-times fewer data generation steps than the ABC rejection sampler, 
and about the same number of data generation steps than the adaptive ABC MCMC sampler. 
The analyses were repeated with different distance functions, such as 



d((x,y), (x d ,y d )) = ^2(\x[i] - x d [i]\ + \y[i] - y d [i] 



(5) 



and 



d{{x,y), (x d ,y d )) 



x ■ x d 



V ■ Vd 



(6) 



INIIWI 1 1 2/ 1 1 1 1 1 1 

where "•" denotes the inner product. As expected, the resulting approximations of posterior distributions 
are very similar (histograms not shown). Replacing the uniform perturbation kernel by a Gaussian kernel 
also yields the same results, but requires more simulation steps (results not shown). 



3.1.2 ABC SMC inference for stochastic LV dynamics 

Having obtained good estimates for the deterministic case, we next try to infer parameters of a stochastic 
LV model. The predator-prey process can be described by the following rate equations: 



a + X 
X + Y 
Y 



2X 
2Y 



with rate c\, 
with rate C2, 
with rate C3, 



(7a) 
(7b) 
(7c) 
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Population 1 



-2 1 2 3 -10 -5 5 10 



Population 2 



0.0 0.5 1.0 1.5 2.0 



0.0 1.0 2.0 3.0 



Population 3 




0.8 1.0 1.2 1.4 0.6 1.0 1.4 



Population 4 



i 



1.0 1.2 1.4 



Population 5 
(Approximation of 
posterior distribution) 



□Ih 



Figure 2: Histograms of populations - 5 of parameters a and b of the Lotka-Volterra system. 



where X denotes the prey population, Y the predator population and a the fixed amount of resource available 



to the prey (we fix it to a = 1). These reactions define the stochastic master equation (van Kampen 20071 

dP(x,y,t) 



dt 



= c\a(x — l)P[x — l,y, t) 

+ c 2 (x + l)(y-l)P(x+l,y-l,t) 

+ c 3 (y + l)P{x,y + l,t) 

- (ciax + c 2 xy + c 3 y)P(x,y,t). 



(8) 



The ABC approach can easily be applied to inference problems involving master equations, because there 



exists an exact simulation algorithm (Gillespie algorithm (Gillespie 1977 Wilkinson] 2006)). 

Our simulated data consists of 19 data points for each species with rates (ci, C2, C3) = (10, 0.01, 10), and 
initial conditions (Xo,Yq) = (1000, 1000). The distance function is the sum of squared errors Q and the 
SMC algorithm is run for T = 5 populations with e = (4000, 2900, 2000, 1900, 1800). Especially in laboratory 
settings, results from several replicate experiments are averaged over; here we therefore also use data averaged 
over three independent replicates. The simulated data at every run are then, like the experimental data, 
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averaged over three runs. For inference purposes the average over several runs tends to hold more information 
about the system's mean dynamics then a single stochastic run. If the experimental data consisted of one 
run only, i.e. if there were no repetitions, then the inference could in principle proceed in the same way, by 
comparing the experimental data to a single simulated run only. This would result in the lower acceptance 
rate and consequently more data generation steps to complete the inference. 

We generate N = 100 particles per population and assign the prior distributions on the parameters to be 
7r(ci) = {7(0,28), 7r(c2) = {7(0.0,0.04) and n(c3) — {7(0,28), reflecting the respective orders of magnitude of 
the simulated parameters (if a larger domain for irfa) is taken, there are no accepted instances for C2 > 0.04). 
Perturbation kernels are uniform with er Cl = a Ca = 1.0 and er C2 = 0.0025, and B t = 10. The results are 
summarized in Figure [3J 




5 10 15 20 25 0.000 0.010 0.020 5 10 15 20 25 

c, c 2 c 3 



Figure 3: Histograms of the approximated posterior distributions of parameters ci, C2 and C3 of the stochastic LV 
dynamics. 



3.2 Parameter inference for the deterministic and stochastic repressilator model 



The repressilator (Elowitz & Leibler 2000 1 is a popular toy model for gene-regulatory systems and consists of 
three genes connected in a feedback loop, where each gene transcribes the repressor protein for the next gene 
in the loop and is repressed by the previous gene. The model consists of six ordinary differential equations 
and four parameters: 

dmi 



mi 4 


a 




(9a) 


l+p% 


f- a 




- mi) 




(9b) 


m 2 4 


a 




(9c) 




f- a 




- m 2 ) 




(9d) 


m 3 4 


a 




(9e) 


l+p« 


t- a 


0(P3 


- m 3 ) 




(9f) 



dt 
dpi 

dt ~ 
dm% 

~dT ~ 
dpi 

dt ~ 
dm 3 

~dT ~ 
dp3 

dt ~ 

3.2.1 Inferability and Sensitivity in deterministic repressilator dynamics 

Let 9 = (aoj n , Pj ot) be the parameter vector. For the simulated data the initial conditions are (mi,pi, m.2,P2, ^3,^3) 
(0, 2, 0, 1, 0, 3) and the values of parameters are 9 = (1, 2, 5, 1000); for these parameters the repressilator dis- 
plays limit-cycle behaviour. We assume that only the mRNA (mi, m 2 , m 3 ) data measurements are available 
and protein concentrations are considered as the missing data. Gaussian noise A/"(0, 5 2 ) is added to the data 
points. The distance function is defined to be the squared errors. The prior parameter distributions are 
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chosen as follows: 7r(a ) = J7(-2,10), ir(n) = £/(0,10), n(/3) = C/(-5,20) and n(a) = {7(500, 2500). We 
assume the initial conditions are known. 

The results are summarized in Figures Qa) -|4^d), where we show the posterior distributions, the change 
in inter-quartile ranges of parameter estimates across the populations and scatterplots of some of the two- 
dimensional parameter combinations. Each parameter is easily inferred when the other three parameters 
are fixed (histograms not shown). When the algorithm is applied to all four parameters simultaneously, 
parameter n is inferred the quickest and has the smallest posterior variance, while parameter a is barely 
inferable and it has large credible intervals (Figures |4| a) and life)). 

We find that ABC SMC recovers the intricate link between model sensitivity to parameter changes and 
infcrability of parameters. The repressilator system is most sensitive to changes in parameter n and least to 
changes in a. Hence, data appears to contain little information about a. Thus ABC SMC provides us with 
a global parameter sensitivity analysis (Sanchez & Blower, 1997) on the fly as intermediate distributions 
are being constructed. Note that the intermediate distributions are nested in one another (as should be the 
case in SMC) (Figure Etc)). An attempt to visualize the four-dimensional posterior distributions can be 
accessed in the supplementary material where we provide an animation in which the posterior distribution 
in four-dimensional parameter space is projected onto two dimensional planes. 

The interquartile ranges and the scatterplots provide an initial impression of parameter sensitivity, how- 
ever, the first problem with scatterplots is that it is increasingly difficult to visualize the behaviour of the 
model with increasing parameter dimension. Secondly, we have to determine the sensitivity when a combi- 
nation of parameters is varied (and not just individual parameters), and this cannot be visualized via simple 
one-dimensional interquartile ranges or two-dimensional scatterplots. 



We can use principal component analysis (PCA) to quantify the sensitivity of the system ( Saltelli et al. 



2008). The output of the ABC SMC algorithm, which we are going to use for our sensitivity analysis, is the 



last population of N particles. Associated with the accepted particles is their variance-covariance matrix, S, 
of rank p 7 where p denotes the dimension of the parameter vector. The principal components (PC) are the 
eigenvectors of £ which define a set of eigen-parameters, \i — a ;i$i + ■ • • + o-ipQp- Here a.j = (a.a, ■ ■ ■ ,Q>ip) 

ijj 



is the normalized eigenvector associated with the i th eigenvalue of S, A.;, and ajj describes the projection 



of parameter 0j onto the i th eigen-parameter. PCA provides an orthogonal transformation of the original 
parameter space and the PCs can be taken to define a p-dimcnsional ellipsoid which approximates the 
population of data points (i.e. the accepted particles), and the eigenvalues specify the p corresponding radii. 
The variance of the i th PC is given by A^ and the total variance of all PCs equals X^f=i Aj = trace(S). 
Therefore, the eigenvalue Xi associated with the i th PC explains a proportion 

trace(E) (10) 

of the variation in the population of points. The smaller the Xi, the more sensitive the system is to variation 
of the eigen-parameter \i ■ PCA only yields an approximate account of sensitivity similar to what would be 
obtained by computing the Hessian around the mode of the posterior distribution. 

Figure Qd) summarizes the output of PCA. Is shows how much of a variance is explained by each PC, 
and which parameters contribute the most to these PCs. On the contrary to the interest in the first PC in 
most PCA applications, our main interest lies in the smallest PC. The last PC extends across the narrowest 
region of posterior parameter distribution, and therefore provides information on parameters to which the 
model is the most sensitive. In other words, the smaller PCs correspond to stiff parameter combinations, 
while the larger PCs may correspond to sloppy parameter combinations (Gutenkunst et al. 20071. 

The analysis reveals that the last PC mainly extends in the direction of a linear combination of parameters 
n and /3, from which we can conclude that the model is most sensitive to changes in these two parameters. 
Looking at the third component, the model is somewhat less sensitive to variation in olq. The model is 
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Figure 4: (a) Histograms of the approximate marginal posterior distributions of parameters ao, n, (3 and a of the 
deterministic repressilator model, (b) The normalized 95% inter-quantile ranges of each population. The narrower 
the interval for a given tolerance et, the more sensitive the model is to the corresponding parameter. The interquartile 
range reached in population 9 is determined by the added experimental noise. As 69 was chosen accordingly, one 
cannot proceed by lowering the tolerance further. The sharp change in the interquartile ranges, which occurs, for 
example, for parameter ao between populations and 1, can be explained by the steep gradient of the likelihood 
surface along ao- (c) The output (i.e. the accepted particles) of ABC SMC algorithm as two dimensional scatterplots. 
The particles from population 1 are coloured in yellow, particles from population 4 in black, particles from population 
7 in blue, and particles from the last population in red. Islands of particles are observed in population 4 and they 
can be explained by the multi-modality of the 4 th intermediate distribution, (d) PCA of the set of accepted particles 
(population 9). Due to dependence of PCA to the scaling of original variables the PCA was done on the correlation 
matrix. The first PC explains 70.0% of the total variance, the second 24.6%, the third 5.3% and the fourth 0.1% of 
the variance. Pie charts show the fraction of the length of PCs explained by individual parameters. 



12 



therefore the least sensitive to changes in parameter a, which is also supported by the composition of the 
2 nd PC. This outcome agrees with the information obtained from the interquartile ranges and the scatterplots. 

3.2.2 Inference of stochastic repressilator dynamics 

We next apply ABC SMC to the stochastic repressilator model. We transformed the deterministic model 
([9]) into a set of the following reactions: 



with hazard 



a 



mi 
Pi 



l+p] 

with hazard rrii, 

rrii + pi with hazard (3rrii, 
with hazard (3pi, 



(11a) 

(lib) 
(11c) 
(lid) 



where i = 1,2,3 and, correspondingly, j — 3,1,2. The stochastic process defined by these reactions can 
be simulated with Gillespie algorithm. True parameters and initial conditions correspond to those of the 
deterministic case discussed above. The data includes both mRNA and protein levels at 19 time points. 
Tolerances are chosen as e = {900, 650, 500, 450, 400}, the number of particles N = 200 and B t — 5. 
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Figure 5: (a) Histograms of the approximated posterior distributions of parameters ao, n, /3 and a of the stochastic 
repressilator model, (b) The output of the ABC SMC algorithm as two dimensional scatterplots. The particles from 
population 1 are coloured in yellow, particles from population 2 in black, particles from population 3 in blue, and 
particles from the last population in red. We notice that the projection to parameter n in population 2 is a sample 
from a multimodal intermediate distribution, while this distribution becomes unimodal from population 3 onwards. 

The inference results for comparing the average over 20 simulations with the data generated from the 
average of 20 simulations are summarized in Figures [BJa,) and[5|b). The figures show that parameters n and 
(3 get reasonably well inferred, while ao and a are harder to infer. It is clearly noticeable that parameters 
are better inferred in the deterministic case (Figure Qa)). 

3.2.3 Contrasting inferability for deterministic and stochastic dynamics 

Analyzing and comparing the results of the deterministic and stochastic repressilator dynamics shows that 
parameter sensitivity is intimately linked to inferability. If the system is insensitive to a parameter, then 



13 



this parameter will be hard (or even impossible) to infer, as varying such a parameter does not vary the 
output - which here is the approximate posterior probability - very much. In stochastic problems we may 
furthermore have the scenario where the fluctuations due to small variations in one parameter overwhelm 
signals from other parameters. This seems to be the case here, where cto is harder to infer in the stochastic 
case compared to the deterministic repressilator dynamics. 



3.3 Model selection on different SIR models 

We illustrate model selection using a range of simple models that can describe the epidemiology of infectious 
diseases. SIR models describe the spread of such disease in a population of susceptible (S), infected (I) and 



recovered (R) individuals (Anderson & May 19911, respectively. The simplest model assumes that every 
individual can only be infected once and that there is no time delay between the individual getting infected 
and their ability to infect other susceptible individuals: 

S = a-jSI-dS, (12a) 
7 = -ySI-vI-dl, (12b) 
it = vI-dR, (12c) 

where x denotes the time derivative of a:, jjj. Individuals, who are born at rate a, are susceptible; the death 
rate (irrespective of disease class, S, I or R) , is d; the infection rate is denoted by 7 and the recovery rate 
by v. 

The model can be made more realistic by adding a time delay r between the time an individual gets 
infected and the time when they become infectious, 

S = a- jSI(t - t) - dS, (13a) 
I = jSI(t-r)-vI-dI, (13b) 
R = vI-dR. (13c) 

Another way of incorporating the time delay into the model is by including a population of individuals 
in a latent (L) phase of infection; in this state they are infected but cannot yet infect others. The equations 
then become 

S = a-jSI-dS, (14a) 

L = -ySI-SL-dL, (14b) 

I = SL-vI-dl, (14c) 

R = vI-dR. (14d) 

Here S denotes the transition rate from the latent to the infective stage. 



Another extension of the basic model (12) allows the recovered individuals to become susceptible again 
(with rate e): 

S = a-jSI - dS + eR, (15a) 
7 = jSI-vI-dl, (15b) 
R = vI-(d+e)R. (15c) 

There are obviously many more ways of extending the basic model, but here we restrict ourselves to the 
four models described above. Given the same initial conditions, the outputs of all models are very similar, 
which makes it impossible to choose the right model by visual inspection of the data alone. Therefore some 
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more sophisticated, statistically based methods needs to be applied for selecting the best available model. 
Therefore we apply the ABC SMC algorithm for model selection, developed in section 2.2. We define a model 
parameter m — 1,2,3,4, representing the above models in the same order, and model-specific parameter 
vectors 9{m): 9(1) = (a,j,d,v), 9(2) = (a, 7, d, v, r), 9(3) = (a,j,d,v,S) and 9(A) — (a,j,d,v,e). 

Experimental data consists of 12 data points from each of the three groups (S, I and R). If the data 
are very noisy (Gaussian noise with standard deviation a = 1 was added to simulated data points), then 
the algorithm cannot detect a single best model, which is not surprising given the high similarity of model 
outputs. However, if intermediate noise is added (Gaussian noise with standard deviation a = 0.2), then the 
algorithm produces a posterior estimate with the most weight on the correct model. An example is shown in 
Figure|6j where experimental data was obtained from model 1 and perturbed by Gaussian noise, Af(0, (0.2) 2 ). 
Parameter inference is done simultaneously with the model selection (posterior parameter distributions not 
shown) . 
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Figure 6: Histograms show populations 1 - 9 of the model parameter m. Population 9 represents the final posterior 
marginal estimate of m. Population (discrete uniform prior distribution) is not shown. 

The Bayes factor can be calculated from the marginal posterior distribution of m which we take from 
the final population. From 1000 particles model 1 (basic model) was selected 664 times, model 2 230 times, 
model 4 106 times, and model 3 was not selected at all in the final population. Therefore, we can conclude 
from the Bayes factors 

^ - We- 6 ' 3 ' ^ 

- Sr 2 - 2 ' (18) 

that there is weak evidence in favour of model 1 compared to model 2 and positive evidence in favour of 
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Table 3: Common cold data from Tristan de Cunha collected in October 1967. 



Day 
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7 


8 


9 


10 


11 


12 


13 


14 


15 


16 


17 


18 


19 


20 


21 


I(t) 
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3G 


36 
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model 1 compared to model 4. Increasing the amount of data will, however, change the Bayes factors in 
favour of the true model. 



3.4 Application: Common cold outbreaks on Tristan da Cunha 

Tristan da Cunha is an isolated island in the Atlantic ocean with approximately 300 inhabitants, and it 
was observed that viral diseases, such as common cold, break out on the island after the arrival of ships 
from Cape Town. We use the 21-day common cold data from October 1967. The data, shown in Table [3] 



was obtained from (Shibli et al. 1971 Hammond & Tyrrell 1971) and was previously used for parameter 



inference of a stochastic model by ABC MCMC ( Cook 2008 1 . The data only provides the numbers of 



infected and recovered individuals, I(t) and R(t), whereas the size of initial susceptible population, 5(0), is 
not known. Therefore, 5(0) is an extra unknown parameter to be estimated. 

The four epidemic models from the previous section are used and because there are no births and 
deaths expected in the short period of 21 days, parameters a and d are set to 0. The tolerances are 
set to e = {100,90,80,73,70,60,50,40,30,25,20,16,15,14,13.8} and 1000 particles are used. The prior 
distributions of parameters are chosen as follows: 7 ~ £/(0,3), v ~ [7(0,3), t ~ U(— 0.5,5), 8 ~ [/(— 0.5,5), 
e ~ [7(— 0.5,5), 5(0) ~ [7(37,100) and m ~ [7(1,4), where 5(0) and m are discrete. Perturbation kernels 
are uniform, K t — er[7(— 1, 1), with <r 7 = a v = 0.3, a r — as — a e — 1.0 and as(o) — 3- 

The target and intermediate distributions of model parameters arc shown in Figure[7J The model selection 
algorithm chooses model (14 1, i.e. the model with a latent class of disease carriers, to be the most suitable 



model for describing the data, however, it is only marginally better than models ( 12 ) and ( 13 1. Therefore, to 



draw reliable conclusions from the inferred parameters one might wish to use model averaging over models 



([12]), (13) and (14). The marginal posterior distributions for parameters of model (14) are shown in Figure 
[8] However, as pointed out by Cook (2008), the estimated initial susceptible population size 5(0) is low 
compared to the whole population of the island, which suggests either that the majority of islanders were 
immune to a given strand of cold or (perhaps more plausibly) that the system is not well represented by our 
general epidemic models with homogeneous mixing. The estimated durations of the latent period, r (from 



model (13)) and 1/6 (from model ( 14 1 ) , however, are broadly in line with the established aetiology of the 
common cold ( |Fields et al. 19961. Thus within the set of candidate epidemiological models the ABC SMC 
approach selects the most plausible model and results in realistic parameter estimates. 



4 Discussion 

Our study suggests that ABC SMC is a promising tool for reliable parameter inference and model selection 
for models of dynamical systems which can be efficiently simulated. Because of its simplicity and generality 
ABC SMC, unlike most other approaches, can be applied without change in deterministic and stochastic 
context (including models with time-delay). 

The advantage of Bayesian statistical inference, in contrast to most conventional optimization algorithms 



(Moles et al. 2003), is that the whole probability distribution of the parameter can be obtained, rather than 



merely the point estimates of the optimal parameter values. Moreover in the context of hypothesis testing, 
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Figure 7: Populations of the marginal posterior distribution of m. Models 1 to 4 correspond to equations ( 12 I - ( 15 I , 
respectively. An interesting phenomena is observed in populations 2 to 12, where model 2 has the highest probability, 
in contrast to model 3 having the highest inferred probability in the last population. The most probable explanation 
for this is that a local maximum favouring model 2 has been passed on route to a global maximum of the posterior 
probability favouring model 3. 
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Figure 8: Histograms of the approximated posterior distributions of parameters 7, v, 8 and S(0) of the SIR model 



with the latent phase of infection ( 14 1 



the Bayesian perspective (Cox & Hinkley 1974 Robert & Casella 20041 has a more intuitive meaning than 



the corresponding frcquentist point of view. ABC methods share these characteristics. 

Another advantage of our ABC SMC approach is that observing the shape of intermediate and posterior 
distributions gives (without any further computational cost) information about the sensitivity of the model 
to different parameters and about the inferability of parameters. All simulations are already part of the 
parameter estimation, and can be conveniently re-used for the sensitivity analysis via scatterplots or via 
the analysis of the posterior distribution using, e.g. PCA. It can be concluded that the model is sensitive 
to parameters which are inferred quickly (in earlier populations) and which have narrow credible intervals, 
while it is less sensitive to the ones which get inferred in later populations and are not very localized by 
the posterior distribution. If the distribution does not change much between populations and resembles the 
uniform distribution from population 0, then it can be concluded that the corresponding parameter is not 
inferable given the available data. 

While parameter estimation for individual models is straightforward when a suitable number of particles 
are used, more care should be taken in model selection problems; the domains of the uniform prior distribu- 
tions should be chosen with care and acceptance rates should be closely monitored. These measures should 
prevent models being rejected in early populations solely due to inappropriately chosen (e.g. too large) prior 
domains. Apart from potentially strong dependence on the chosen prior distributions (which is also inherent 
in standard Bayesian model selection (Kass & Raftery 1995[ )) we also observe dependency of Bayes factors 



to changes in the tolerance levels, e*, and perturbation kernel variances. Therefore care needs to be taken 
when applying the ABC SMC model selection algorithm. 

Finally, we want to stress the importance of monitoring convergence in ABC SMC. There are several 
ways to see if a good posterior distribution has been obtained: we can use inter-quartile ranges or tests of 
goodness-of-fit between successive intermediate distributions. A further crucial signature is the number of 
proposals required to obtain a specified number of accepted particles. This will also impose a practical limit 
on the procedure. 

For the problems in this paper the algorithm was efficient enough. Examples here were chosen to highlight 
different aspects of ABC SMC's performance and usability. However, for the use on larger systems the 
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algorithm can be made more computationally efficient by optimizing the number of populations, the distance 
function, the number of particles and perturbation kernels (e.g. adaptive kernels). Moreover, the algorithm 
is easily parallelized. 



5 Conclusion 

We have developed a sequential Monte Carlo ABC algorithm which can be used to estimate model parameters 
(including their credible intervals) and to distinguish among a set of competing models. This approach can 
be applied to deterministic and stochastic systems in the physical, chemical and biological sciences, for 
example biochemical reaction networks or signaling networks. Because of the link between sensitivity and 
inferability ABC SMC can, however, also be applied to larger systems: critical parameters will be identified 
quickly, while the system is found to be relatively insensitive to parameters that are hard to infer. 
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A Appendix: Derivation of ABC SMC 



Here we derive the ABC SMC algorithm from the sequential importance sampling (SIS) algorithm of Del 



Moral et al. ( 2006 2008 ) and in Appendix B we show why this improves on the ABC partial rejection control 



(PRC) algorithm developed by Sisson et al. (20071. We start by briefly explaining the basics of importance 
sampling and present the SIS algorithm. 

Let 7r be the distribution we want to sample from, our target distribution. If it is impossible to sample 
from 7r directly, one can sample from a suitable proposal distribution, ry, and use importance sampling weights 
to approximate w. The Monte Carlo approximation of rj is 

N 



i=X 



where X^ % ' ~ lld rj and 5 Xo (x) is the Dirac delta function, defined by 

f(x)S X0 (x)dx = /(x ). 

If we assume 7r(x) > => f](x) > 0, then the target distribution tt can be approximated by 

1 N 

M*) = ^£™(* (i) )<W*), 

i=l 

with importance weights defined as 



w(x) 



7r(x) 

T)(x) ' 



In other words, to get a sample from the target distribution ir, one can instead sample from the proposal 
distribution, rj, and weight the samples by importance weights, w. 
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In sequential importance sampling one reaches the target distribution ttt through a series of intermediate 
distributions, 7T£, t = 1, . . . , T — 1. If it is hard to sample from these distributions one can use the idea of 
importance sampling described above to sample from a series of proposal distributions r\ t and weight the 
obtained samples by importance weights 



wt(x t ) 

In SIS the proposal distributions arc defined as 



Vt(x t ) ' 



r}t-i(x t -i)Kt(x t -i,xt)dx t -i, 



(19) 



(20) 



where r\t-\ is the previous proposal distribution and K t is a Markov kernel. 
In summary we can write the SIS algorithm as follows: 



Initialization Set t 



For i = 1, 



, TV draw X 

Ai) 



(i) 



Evaluate w\(X\ ) using (19 1 and normalize 



Iteration Set t = t + 1, if t = T + 1 stop. 

For i = 1, . . . ,N draw X t ~ K t {xf\, .). 



Evaluate Wt{x[ 1 ^) using (19 1 with i]t(x t ) from (20 1 and normalize 



To apply SIS, one needs to define the intermediate and the proposal distributions. Taking this SIS 
framework as a base, we now define ABC SMC to be a special case of the SIS algorithm, where we choose 
the intermediate and proposal distributions in an ABC fashion, as follows. The intermediate distributions 
are defined as 

n(x) = ^-f^l(d(D ,D (b) (x))<e t ), 

* 6=1 

where tt(x) denotes the prior distribution and Dm\, . . . , D(B t ) are B t datasets generated for a fixed parameter 
x, D(u\ ~ p(D\x). l(x) is an indicator function and e t is the tolerance required from particles contributing 
to the intermediate distribution t. This allows us to define b t (x) = Ylb=i ■"■ {d(Do, D^)(x)) < £t). We define 
the first proposal distribution to equal the prior distribution, r/i = ir. The proposal distribution at time t 
(t = 2, . . . ,T), r]t, is defined as the perturbed intermediate distribution at time t — 1, 7r t _ 1; such that for 
every sample, x, from this distribution we have 

(i) bt(x) > (in other words, particle x is accepted at least one out of B t times), and 

(ii) 7r(x) > (in order for a condition ir t (x) > r)t(x) > to be satisfied): 



r} t (x f ) = ±(n t (xt) >0)t{b t {x t ) >0) / irt-iQct^Ktfa-uxJdxt-i (21) 
= 1 (ir t (x t ) > 0) 1 (bt(x t ) > 0) / r}t-i(x t -i)wt-i(x t -i)Kt(xt-i,x t )dx t -i, 



where K t denotes the perturbation kernel (random walk around the particle). 
To calculate the weights defined by 



»><" " ifip (22 > 



we need to find an appropriate way of evaluating rj t (xt) defined in equation (21 1. This can be achieved 
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through the standard Monte Carlo approximation, 

= iW,)>o)i(M Il) >o)| lH (,_ 1 )i( i ( lHlI1 )^ 1 

» l(7r t (ar t )>0)l(6 t (x t )>0)-i : ^ K^x^xt), 



N 

a: t-i~ 7r t-i 



where N denotes the number of particles and {xj^}, i = 1, . . . ,N, are all the particles from the intermediate 
distribution Wt-i- The unnormalized weights d22| can then be calculated as 



7r(x t )&t(a; t ) 

where ir is the prior distribution, b t (x t ) — Ylf—i 1 {d(D , D^(x t )) < et), and Dry, 6=1,..., B t , are the B t 
simulated datasets for a given parameter x±. For B t = 1 the weights become 

7r(#t) 



w*(a; t ) 



for all accepted particles Xt- 

The ABC SMC algorithm can be written as follows: 

SI Initialize t\, . . . , ex- 
Set the population indicator t = 0. 

52.0 Set the particle indicator i = 1. 

52.1 If £ = 0, sample x** independently from 7r(a;). 

If t > 0, sample x* from the previous population {xj!^} with weights and perturb the particle 

to obtain x** ~ i^t(x|a;*), where i^t is a perturbation kernel. 
If ir(x**) = 0, return to S2.1. 

Simulate a candidate dataset D^(x**) ~ /(£)|x**) times (6=1,..., B t ) and calculate 6t(x**). 
If b t (x**) = 0, return to S2.1. 

52.2 Set = x** and calculate the weight for particle x[ l \ 



&t(sf), if * = 0, 



If i < iV set i = i + 1, go to S2.1. 

S3 Normalize the weights. 

If t < T, set * = t + 1, go to S2.0. 

When applying ABC SMC to deterministic systems we take B t — 1, i.e. we simulate the dataset for each 
particle only once. 

B Appendix: Comparison of ABC SMC algorithm with the ABC 
PRC of Sisson et al. 

In this section we contrast the ABC SMC algorithm, which we developed above, and the ABC PRC algorithm 



of Sisson et al. (20071. The algorithms are very similar in principle, and the main difference is that we base 
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ABC SMC on a SIS framework whereas Sisson et al. use a SMC sampler as a basis for ABC PRC, where the 
weight calculation is done through the use of a backward kernel. Both algorithms are explained in detail in 



Del Moral et al. ( 2006 2008 1. The disadvantage of the SMC sampler is that it is impossible to use an optimal 



backward kernel and it is hard to choose a good one. Sisson et al. choose a backward kernel that is equal to 
the forward kernel, which we suggest can be a poor choice]^] While this highly simplifies the algorithm since 
in the case of a uniform prior distribution all the weights become equal, the resulting posterior distributions 
can result in bad approximations to the true posterior. In particular, using equal weights can profoundly 
affect the posterior credible interval^ 

Here we compare the outputs of both algorithms using the toy example from |Sisson et al. ( 2007| ). The goal 
is to generate samples from a mixture of two normal distributions, ^</> (0, jgg) + ^</> (0, 1) , where </>(//, <r 2 ) is 
the density function of a N(fi, a 2 ) random variable (Figure [9j a)). Sisson et al. approximate the distribution 
well by using three populations with ei = 2, €2 = 0.5 and £3 = 0.025, respectively, starting from a uniform 
prior distribution. However, ABC PRC would perform poorly if they had used more populations. In 
Figures [9jb) and[9jc) we show how the variance of the approximated posterior distribution changes through 
populations. We use 100 particles and average over 30 runs of the algorithm, with tolerance schedule e = 
{2.0, 1.5, 1.0, 0.75, 0.5, 0.2, 0.1, 0.075, 0.05, 0.03, 0.025}. The red line shows the variance from ABC SMC, the 
blue line the variance from the ABC PRC, and the green line the variance from the ABC rejection algorithm. 
In case when the perturbation is relatively small, we see that the variance resulting from improperly weighted 
particles in ABC PRC is too small (blue line in Figure [9jb)), while the variance resulting from ABC SMC 
is ultimately comparable to the variance obtained by ABC rejection algorithm (green line). 

However, we also notice that using many populations and a too small perturbation, e.g. uniform pertur- 
bation K t — aU{— 1, 1) with a — 0.15, the approximation is not very good (results not shown). One therefore 
needs to be careful to use a sufficiently large perturbation, irrespective of the weighting scheme. However, 
at the moment there are no strict guidelines of how best to do this and we decide on the perturbation kernel 
based on experience and experimentation. 

For the one-dimensional deterministic case with a uniform prior distribution and uniform perturbation, 
one can show that all the weights will be equal when a of the uniform kernel is equal to the whole parameter 
range. In this case the non- weighted ABC PRC yields the same r esults as ABC SMC (Figure ^c)). However, 
when working with complex high-dimensional systems it simply is not feasible to work with only a small 
number of populations, or a very big perturbation kernel spreading over the whole parameter range. Therefore 
we conclude that it is important to use the ABC SMC algorithm because of its correct weighting. 

Another difference between the two algorithms is that ABC PRC includes the resampling step in order 



to maintain a sufficiently large effective sample size (ESS, (Del Moral et al. 2006 Liu & Chen 1998 Liu 



2001 )). In contrast to non- ABC SIS or SMC frameworks, ABC algorithms, by sampling with weights in step 



S2.1 prior to perturbation, we suggest, do not require this additional resampling step. 

We note that in SIS the evaluation of weights after each population is computationally more costly, i.e. 
0(N 2 ), than the calculation of weights in SMC with a backward kernel, which is O(N), where N denotes 
the number of particles. However, this cost is negligible in the ABC case, because the vast proportion of 
computational time in ABC is spent on simulating the model repeatedly. While thousands or millions of 
simulations are needed, the weights only need to be updated after every population. Therefore we can easily 
afford spending a bit more computational time in order to use the correctly weighted version and circumvent 
the issues related to backward kernel choice. 



3 See also http://web.maths.unsw.edu.au/ scott /papers / paper _smcabc_optim al.pdf 
4 This, including the experiment below, was suggested by Beaumont (2008b . 
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Figure 9: (a) The probability density function of a mixture of two normal distributions, ^(j> (0, j^) + §0(0, 1), 
taken as a toy example used for comparison of ABC SMC with ABC PRC. (b)-(c) Plots show how the variance of 
approximated intermediate distributions Tv t changes with populations (t = 1, . . . , 10 on x-axis). The red curves plot 
the variance of ABC SMC population and the blue curves the variance of non-weighted ABC PRC populations. The 
perturbation kernel in both algorithms is uniform, Kt = aU(—l, 1). In (b) a — 1.5 and this results in poor estimation 
of the posterior variance with ABC PRC algorithm. In (c) a is updated in each population so that it expands over 
the whole population range. Such a is big enough for non-weighted ABC PRC to perform equally well as ABC SMC. 



C Appendix: ABC and full likelihood for ODE systems 

We would like to note that the approximate Bayesian computation algorithms with distance function chosen 
to be squared errors is equivalent to the maximum likelihood problem for a dynamical systems for which 
Gaussian errors are assumed. In other words, minimizing the distance function 

n m 

EI>tf-5i(*i>*)) 2 ' ( 23 ) 
»=1 j=l 

where g(t,9) S R rn is the solution of the m-dimensional dynamical system and D = (xi)u—i n \, %i € R m , 
are (m-dimensional) data points measured at times ti,...,t n , is equivalent to maximizing the likelihood 
function 



1 

TT - 



e 



-h i (x t -g{t i ,8)) T ?,- 1 {x i ~g{t i ,e)) 



l\ (27T)-/2|E|i 

where £ is diagonal and its entries equal. This can straightforwardly be generalized for the case of multiple 
time series measurements. Thus ABC is for deterministic ODE systems closely related to standard Bayesian 
inference where the likelihood is evaluated. This is because ODEs are not based on a probability model, and 
likelihoods are therefore generally defined in a non-linear regression context (such as assuming that data is 



normally distributed around the deterministic solution), see e.g. Timmer fc Muller] ( 2004 ) and Vyshemirsky 



& Girolami (20081 
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Supplementary material 



Video of prior, intermediate and posterior distributions of deterministic repres- 
silator model 

The video shows the particles from the 4-dimensional distributions corresponding to the (a) prior distribu- 
tion, (b) 4 th intermediate distribution, (c) approximate posterior distribution obtained for the deterministic 



repressilator model. The video was produced using the GGobi (Swayne et al. 20031 and RGGobi (Lang 



et al. 2006 1 packages from the R statistical environment (www.R-project.org) 



Datasets 

Simulated datasets used in examples in this paper are attached as a supplementary material. 



References 

Anderson, R. M. & May, R. M. 1991 Infectious Diseases of Humans: Dynamics and Control Oxford Univer- 
sity Press. 

Baker, C, Bocharov, G., Ford, J., Lumb, P. et al. 2005 Computational approaches to parameter estimation 
and model selection in immunology J Comput Appl Math 184, 50-76 doi:10.1016/j.cam.2005.02.003. 

Banks, H., Grove, S., Hu, S. & Ma, Y. 2005 A hierarchical Bayesian approach for parameter estimation in 
HIV models Inverse Problems. 

Battogtokh, D., Asch, D. K., Case, M. E., Arnold, J. & Schuttler, H.-B. 2002 An ensemble method for 
identifying regulatory circuits with special reference to the qa gene cluster of Neurospora crassa Proc Natl 
Acad Set USA 99, 16904-9 doi:10.1073/pnas.262658899. 

Beaumont, M. 2008a Simulations, genetics and human prehistory McDonald Institute Monographs, Univer- 
sity of Cambridge, edited by S. Matsumura, P. Forester and C. Renfrew. 

Beaumont, M. A. 2008b A note on the ABC-PRC algorithm of Sissons et al. (2007) arXiv stat.CO. 

Beaumont, M. A., Zhang, W. & Balding, D. J. 2002 Approximate bayesian computation in population 
genetics Genetics 162, 2025-35. 

Bortz, D. M. & Nelson, P. W. 2006 Model selection and mixed-effects modeling of HIV infection dynamics 
Bull Math Biol 68, 2005-25 doi:10.1007/sll538-006-9084-x. 

Brown, K. S. & Sethna, J. P. 2003 Statistical mechanical approaches to models with many poorly known 
parameters Physical review E 68. 

Cook, A. R. 2008 Two novel alternatives to data augmentation in inference for epidemics In preparation. 

Cox, R. D. & Hinkley, D. V. 1974 Theoretical Statistics Chapman & Hall/CRC, New York. 

Del Moral, P., Doucet, A. & Jasra, A. 2006 Sequential Monte Carlo samplers J. Royal Statist. Soc. B. 

Del Moral, P., Doucet, A. & Jasra, A. 2008 Sequential Monte Carlo for Bayesian computation in press. 

Elowitz, M. B. & Leibler, S. 2000 A synthetic oscillatory network of transcriptional regulators Nature 403, 
335-8 doi:10. 1038/35002125. 



24 



Enright, W. & Hu, M. 1995 Interpolating Runge-Kutta methods for vanishing delay differential equations 
Computing 55, 223-236. 

Fagundes, N., Ray, N., Beaumont, M., Neuenschwander, S. et al. 2007 Statistical evaluation of alternative 
models of human evolution Proc Natl Acad Sci USA doi:10.1073/pnas.0708280104. 

Fields, B. N., Knipe, D. M. & Howley, P. M. 1996 Fundamental Virology Lippincott Williams & Wilkins. 

Galassi, M. 2003 GNU scientific library reference manual Cambridge University Press, 2nd edition. 

Gillespie, D. 1977 Exact stochastic simulation of coupled chemical reactions The Journal of Physical Chem- 
istry. 

Golightly, A. & Wilkinson, D. 2005 Bayesian inference for stochastic kinetic models using a diffusion ap- 
proximation Biometrics. 

Golightly, A. & Wilkinson, D. 2006 Bayesian sequential inference for stochastic kinetic biochemical network 
models J Comput Biol. 

Gutenkunst, R., Waterfall, J., Casey, F., Brown, K., Myers, C. & Sethna, J. 2007 Universally sloppy param- 
eter sensitivities in systems biology models PLoS Comput Biol 3, el89 doi: 10. 1371/journal.pcbi. 0030189. 

Hammond, B. J. & Tyrrell, D. A. 1971 A mathematical model of common-cold epidemics on Tristan da 
Cunha The Journal of hygiene 69, 423-33. 

Hooting, J., Madigan, D., Raftery, A. & Volinsky, C. 1999 Bayesian model averaging: a tutorial Statist. Sci. 

Horbclt, W., Timmcr, J. & Voss, H. 2002 Parameter estimation in nonlinear delayed feedback systems from 
noisy data Phys Lett A 299, 513-521. 

Huang, Y., Liu, D. & Wu, H. 2006 Hierarchical Bayesian methods for estimation of parameters in a longitu- 
dinal HIV dynamic system Biometrics 62, 413-23 doi:10.1111/j.l541-0420.2005.00447.x. 

Johannes, M. & Poison, N. 2005 MCMC methods for continuous-time financial econometrics Handbook of 
Financial Econometrics (eds. Y. Ait-Sahalia and L. Hansen), Elsevier Science Ltd.. 

van Kampen, N. G. 2007 Stochastic Processes in Physics and Chemistry North-Holland, 3rd edition. 

Kass, R. & Raftery, A. 1995 Bayes factors Journal of the American Statistical Association. 

Kirkpatrick, S., Gelatt, C. & Vecchi, M. 1983 Optimization by simulated annealing Science 220, 671-680 
doi: 10. 1 126 /science.220.4598.671 . 

Lang, D. T., Swaync, D., Wickham, H. & Lawrence, M. 2006 rggobi: An interface between R and GGobi 
http:/ / www. R-project. org. 

Liu, J. S. 2001 Monte Carlo Strategies in Scientific Computing Springer. 

Liu, J. S. & Chen, R. 1998 Sequential Monte Carlo methods for dynamic systems Journal of the American 
Statistical Association. 

Lotka, A. J. 1925 Elements of Physical Biology Baltimore: Williams & Wilkins Co. 

Marjoram, P., Molitor, J., Plagnol, V. & Tavare, S. 2003 Markov chain Monte Carlo without likelihoods 
Proc Natl Acad Sci USA 100, 15324-8 doi:10.1073/pnas.0306899100. 



25 



Mendes, P. & Kcll, D. 1998 Non-linear optimization of biochemical pathways: applications to metabolic 
engineering and parameter estimation Bioinformatics 14, 869-83. 

Moles, C, Mendes, P. & Banga, J. 2003 Parameter estimation in biochemical pathways: a comparison of 
global optimization methods Genome Res 13, 2467-74 doi:10.1101/gr.l262503. 

Mullcr, T. G., Faller, D., Timmer, J., Swameye, I., Sandra, O. & Klingmiiller, U. 2004 Tests for cycling in 
a signalling pathway Journal of the Royal Statistical Society Series C 53, 557. 

Paul, C. 1992 Developing a delay differential equation solver Applied Numerical Mathematics 9, 403-414. 

Press, W. H., Tcukolsky, S. A., Vetterling, W. T. & Flannery, B. P. 1992 Numerical recipes in C: the art of 
scientific computing (2nd edition) Cambridge University Press. 

Pritchard, J., Sciclstad, M. T., Perez-Lezaun, A. & Feldman, M. W. 1999 Population growth of human Y 
chromosomes: a study of Y chromosome microsatellites Molecular Biology and Evolution 16, 1791-1798. 

Putter, H., Hcistcrkamp, S. H., Lange, J. M. A. & de Wolf, F. 2002 A Bayesian approach to parameter 
estimation in HIV dynamical models Statistics in medicine 21, 2199-214 doi:10.1002/sim.l211. 

Rcinkcr, S., Altman, R. & Timmer, J. 2006 Parameter estimation in stochastic biochemical reactions IEE 
Proc.-Syst. Biol. 

Robert, C. P. & Casella, G. 2004 Monte Carlo Statistical Methods Springer, 2nd edition. 

Saltclli, A., Ratto, M., Andres, T., Campolongo, F. et al. 2008 Global sensitivity analysis: The primer John 
Wiley and Sons. 

Sanchez, M. & Blower, S. 1997 Uncertainty and sensitivity analysis of the basic reproductive rate: Tubercu- 
losis as an example American Journal of Epidemiology 145, 1127-1137. 

Shibli, M., Gooch, S., Lewis, H. E. & Tyrrell, D. A. 1971 Common colds on Tristan da Cunha The Journal 
of hygiene 69, 255. 

Sisson, S. A., Fan, Y. & Tanaka, M. M. 2007 Sequential Monte Carlo without likelihoods Proc Natl Acad Sci 
USA 104, 1760-5 doi:10.1073/pnas.0607208104. 

Stumpf, M. & Thorne, T. 2006 Multi-model inference of network properties from incomplete data Journal 
of Integrative Bioinformatics. 

Swaync, D., Lang, D., Buja, A. & Cook, D. 2003 Ggobi: evolving from xgobi into an extensible framework 
for interactive data visualization Computational Statistics and Data Analysis. 

Timmer, J. & Muller, T. 2004 Modeling the nonlinear dynamics of cellular signal transduction Int J Bifurcat 
Chaos 14, 2069-2079. 

Volterra, V. 1926 Variazioni e fluttuazioni del numero d'individui in specie animali conviventi Mem. R. Acad. 
Naz. dei Lincei 2, 31-113. 

Vyshemirsky, V. & Girolami, M. A. 2008 Bayesian ranking of biochemical system models Bioinformatics 24, 
833-9 doi : 1 . 1093 /bioinformatics /btm607. 

Wilkinson, D. J. 2006 Stochastic Modelling for Systems Biology Chapman & Hall/CRC. 

Wilkinson, R. D. 2007 Bayesian inference of primate divergence times PhD thesis, University of Cambridge. 

Zucknick, M. 2004 Approximate Bayesian Computation in population genetics and genomics MSc thesis, 
Imperial College London. 



26 



